Skip to content

fix: use one big vnet and attach AKS clusters to it to avoid creating bastion multiple times#8646

Open
awesomenix wants to merge 1 commit into
mainfrom
nishp/fast/prefetchopt
Open

fix: use one big vnet and attach AKS clusters to it to avoid creating bastion multiple times#8646
awesomenix wants to merge 1 commit into
mainfrom
nishp/fast/prefetchopt

Conversation

@awesomenix
Copy link
Copy Markdown
Contributor

  • Make everyone's life simple
  • README has all the details, but idea is to use shared VNET and stop doing unnecessary activities multiple times.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the E2E harness to use a single shared per-location VNet and Bastion (in abe2e-{location}) and attaches all AKS clusters to per-cluster subnets inside that VNet, reducing repeated Bastion provisioning and centralizing shared infrastructure creation.

Changes:

  • Add shared-infrastructure provisioning (shared_infra.go) to create/ensure a shared VNet, Bastion, Firewall, and managed identity, plus auto-allocation of per-cluster subnets.
  • Update cluster creation and networking helpers to rely on VnetSubnetID (BYO VNet/subnet) rather than discovering VNets in the MC_ resource group.
  • Update SSH-over-Bastion flow and documentation to reflect shared Bastion/VNet usage.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
e2e/vmss.go Switch Bastion SSH command to use shared Bastion name + shared RG.
e2e/shared_infra.go Introduce shared VNet/Bastion/Firewall/identity creation and per-cluster subnet allocation/cleanup.
e2e/README.md Document the new shared-infra architecture and updated scenario authoring pointers.
e2e/node_config.go Adjust tenant ID derivation to handle user-assigned identity clusters.
e2e/kube.go Read cluster subnet ID from agent pool VnetSubnetID instead of listing VNets.
e2e/cluster.go Create per-cluster subnet after name hashing; use shared Bastion; derive VNet from subnet ID.
e2e/cache.go Ensure all cached cluster creators configure the shared VNet before preparing clusters.
e2e/aks_model.go Avoid hardcoded subnet CIDRs; use parsed VNet/subnet info when configuring firewall/isolated settings.

Comment thread e2e/aks_model.go Outdated
Comment on lines 475 to 479
subnetAddressPrefix := vnet.addressPrefix

subnetParameters := armnetwork.Subnet{
ID: to.Ptr(subnetId),
Properties: &armnetwork.SubnetPropertiesFormat{
Comment thread e2e/shared_infra.go Outdated
Comment thread e2e/shared_infra.go
Comment thread e2e/shared_infra.go
Comment thread e2e/node_config.go Outdated
Comment thread e2e/README.md
Copilot AI review requested due to automatic review settings June 5, 2026 20:54
@awesomenix awesomenix force-pushed the nishp/fast/prefetchopt branch from 2579b61 to 738f4ae Compare June 5, 2026 20:54
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

Comment thread e2e/aks_model.go Outdated
Comment thread e2e/shared_infra.go
Comment thread e2e/shared_infra.go
Comment thread e2e/node_config.go Outdated
Comment thread e2e/shared_infra.go
Comment thread e2e/README.md
@awesomenix awesomenix force-pushed the nishp/fast/prefetchopt branch from 738f4ae to 69c05f8 Compare June 5, 2026 21:00
Copilot AI review requested due to automatic review settings June 5, 2026 23:02
@awesomenix awesomenix force-pushed the nishp/fast/prefetchopt branch from 69c05f8 to f928610 Compare June 5, 2026 23:02
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.

Comment thread e2e/shared_infra.go
Comment thread e2e/shared_infra.go
Comment thread e2e/cluster.go
Comment thread e2e/node_config.go Outdated
Comment thread e2e/aks_model.go
Comment thread e2e/aks_model.go
Comment thread e2e/README.md Outdated
@awesomenix awesomenix force-pushed the nishp/fast/prefetchopt branch from f928610 to be31cd4 Compare June 6, 2026 00:55
Copilot AI review requested due to automatic review settings June 6, 2026 13:36
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.

Comment thread e2e/shared_infra.go
Comment thread e2e/shared_infra.go
Comment thread e2e/aks_model.go Outdated
Comment thread e2e/node_config.go Outdated
Comment thread e2e/README.md
Comment thread e2e/README.md
Comment thread e2e/aks_model.go
@awesomenix awesomenix force-pushed the nishp/fast/prefetchopt branch from e71ee5d to 8315a13 Compare June 6, 2026 15:07
Copilot AI review requested due to automatic review settings June 6, 2026 21:54
@awesomenix awesomenix force-pushed the nishp/fast/prefetchopt branch from 8315a13 to 1bb4eab Compare June 6, 2026 21:54
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 9 comments.

Comment thread e2e/shared_infra.go
Comment on lines +126 to +130
poller, err := config.Azure.Subnet.BeginCreateOrUpdate(ctx, rg, SharedVNetName, PESubnetName, armnetwork.Subnet{
Properties: &armnetwork.SubnetPropertiesFormat{
AddressPrefix: to.Ptr(PESubnetCIDR),
},
}, nil)
Comment thread e2e/shared_infra.go
Comment on lines +153 to +155
for _, subnet := range page.Value {
name := *subnet.Name
if !strings.HasPrefix(name, "aks-subnet-") {
Comment thread e2e/shared_infra.go
Comment on lines +434 to +437
for _, s := range page.Value {
if s.Properties != nil && s.Properties.AddressPrefix != nil {
usedCIDRs[*s.Properties.AddressPrefix] = true
}
Comment thread e2e/shared_infra.go
Comment on lines +359 to +361
// ensureClusterIdentity creates a user-assigned managed identity for AKS clusters
// and grants it Network Contributor on the subscription so it can manage route tables
// in both the shared VNet and the MC_ resource groups.
Comment thread e2e/cluster.go
Comment on lines +557 to +558
toolkit.Logf(ctx, "using shared bastion %s in %s", SharedBastionName, sharedRG)
return NewBastion(config.Azure.Credential, config.Config.SubscriptionID, sharedRG, *sharedBastion.Properties.DNSName), nil
Comment thread e2e/aks_model.go
Comment on lines +599 to +605
if zone.Name == nil || *zone.Name != privateZoneName {
continue
}
zoneRG := resourceGroupFromID(*zone.ID)
if strings.EqualFold(zoneRG, sharedRG) {
continue
}
Comment thread e2e/aks_model.go
Comment on lines +630 to +631
toolkit.Logf(ctx, "deleting conflicting DNS zone link %s in %s/%s (points to shared VNet)", *link.Name, zoneRG, zoneName)
poller, err := config.Azure.VirutalNetworkLinksClient.BeginDelete(ctx, zoneRG, zoneName, *link.Name, nil)
Comment thread e2e/aks_model.go
Comment on lines +1054 to +1059
if privateEndpoint.Properties == nil || len(privateEndpoint.Properties.NetworkInterfaces) == 0 {
return fmt.Errorf("private endpoint has no network interfaces")
}

aRecords := make([]*armprivatedns.ARecord, len(ipAddresses))
for i, ip := range ipAddresses {
aRecords[i] = &armprivatedns.ARecord{IPv4Address: &ip}
}
ttl := int64(10)
aRecordSet := armprivatedns.RecordSet{
Properties: &armprivatedns.RecordSetProperties{
TTL: &ttl,
ARecords: aRecords,
},
}
_, err := config.Azure.RecordSetClient.CreateOrUpdate(ctx, nodeResourceGroup, privateZoneName, armprivatedns.RecordTypeA, *dnsConfig.Fqdn, aRecordSet, nil)
if err != nil {
return fmt.Errorf("failed to create record set: %w", err)
}
nicID := *privateEndpoint.Properties.NetworkInterfaces[0].ID
nicName := nicID[strings.LastIndex(nicID, "/")+1:]
Comment thread e2e/aks_model.go
Comment on lines +1065 to +1067
// Each NIC IP config has a private IP and an associated FQDN from the
// PE's privateLinkServiceConnections. Create one A record per FQDN.
for _, ipConfig := range nic.Properties.IPConfigurations {
@awesomenix awesomenix force-pushed the nishp/fast/prefetchopt branch from 1bb4eab to 0648508 Compare June 7, 2026 04:04
Copilot AI review requested due to automatic review settings June 7, 2026 04:21
@awesomenix awesomenix force-pushed the nishp/fast/prefetchopt branch from 0648508 to 7f6b2a6 Compare June 7, 2026 04:21
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.

Comment thread e2e/aks_model.go
Comment on lines +115 to 118
networkProfile.ServiceCidr = to.Ptr("172.16.0.0/16")
networkProfile.ServiceCidrs = []*string{
networkProfile.ServiceCidr,
to.Ptr("fd12:3456:789a:1::/108"),
Comment thread e2e/shared_infra.go
Comment on lines +135 to +138
poller, err := config.Azure.Subnet.BeginCreateOrUpdate(ctx, rg, SharedVNetName, PESubnetName, armnetwork.Subnet{
Properties: &armnetwork.SubnetPropertiesFormat{
AddressPrefix: to.Ptr(PESubnetCIDR),
},
Comment thread e2e/shared_infra.go
Comment on lines +389 to +392
existing, err := config.Azure.UserAssignedIdentities.Get(ctx, rg, SharedClusterIdentity, nil)
if err == nil {
return *existing.ID, *existing.Properties.TenantID, nil
}
Comment thread e2e/cluster.go
Comment on lines +557 to +558
toolkit.Logf(ctx, "using shared bastion %s in %s", SharedBastionName, sharedRG)
return NewBastion(config.Azure.Credential, config.Config.SubscriptionID, sharedRG, *sharedBastion.Properties.DNSName), nil
Comment thread e2e/cluster.go
Comment on lines +721 to +723
// Check if the record already exists with the correct IPs
existing, err := config.Azure.RecordSetClient.Get(ctx, resourceGroup, zoneName, armprivatedns.RecordTypeA, recordName, nil)
if err == nil && existing.Properties != nil && existing.Properties.ARecords != nil {
Comment thread e2e/aks_model.go
Comment on lines +1067 to +1077
for _, ipConfig := range nic.Properties.IPConfigurations {
if ipConfig.Properties == nil || ipConfig.Properties.PrivateIPAddress == nil {
continue
}
ip := *ipConfig.Properties.PrivateIPAddress

func addDNSZoneGroup(ctx context.Context, privateZone *armprivatedns.PrivateZone, nodeResourceGroup, privateZoneName, endpointName string) error {
groupName := strings.Replace(privateZoneName, ".", "-", -1) // replace . with -
_, err := config.Azure.PrivateDNSZoneGroup.Get(ctx, nodeResourceGroup, endpointName, groupName, nil)
if err == nil {
return nil
}
dnsZonegroup := armnetwork.PrivateDNSZoneGroup{
Name: to.Ptr(fmt.Sprintf("%s/default", privateZoneName)),
Properties: &armnetwork.PrivateDNSZoneGroupPropertiesFormat{
PrivateDNSZoneConfigs: []*armnetwork.PrivateDNSZoneConfig{{
Name: to.Ptr(groupName),
Properties: &armnetwork.PrivateDNSZonePropertiesFormat{
PrivateDNSZoneID: privateZone.ID,
// The PE's CustomDNSConfigs or PrivateLinkServiceConnections tell us the
// FQDN, but they may be empty. Use the IP config's
// PrivateLinkConnectionProperties for the FQDN list.
if ipConfig.Properties.PrivateLinkConnectionProperties == nil || len(ipConfig.Properties.PrivateLinkConnectionProperties.Fqdns) == 0 {
continue
- go.mod
- go.sum
- e2e/
- e2e/scenario_win_test.go
@awesomenix awesomenix force-pushed the nishp/fast/prefetchopt branch from 7f6b2a6 to 9b2d401 Compare June 7, 2026 15:07
Copy link
Copy Markdown
Collaborator

@Devinwong Devinwong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice consolidation — moving to a single per-region shared VNet/Bastion/Firewall is a big simplification and should cut Bastion cost significantly. Most of my comments are about correctness under concurrent e2e processes in the same region, since the new shared resources are by design global within a region and the existing in-process cachedFunc does not protect against another go test invocation racing on the same ARM resources.

🔴 High — cleanupOrphanedSubnets can delete an in-flight subnet of a concurrent run

e2e/shared_infra.go ensureSharedInfracleanupOrphanedSubnets

The deletion criterion is "no active resources on the subnet AND the AKS cluster doesn't exist (404)". But in prepareCluster we create the subnet before getOrCreateCluster. There is a real window where:

  • process A is creating aks-subnet-<cluster> → subnet exists, IPConfigurations empty, AKS Get returns 404 (cluster not yet PUT)
  • process B starts cold, runs ensureSharedInfracleanupOrphanedSubnets → sees A's subnet as orphaned and deletes it

This would surface as a confusing failure where A's cluster creation later can't find its subnet. Suggest one of:

  • Only delete subnets whose provisioningState=Succeeded and whose age (from a tag we set at creation, e.g., createdAt) is older than some safety threshold (e.g., 1h).
  • Or skip cleanup entirely and rely on the abe2e-{location} RG TTL/janitor.

🔴 High — ensureClusterSubnet retries the same CIDR on 409

cidr := allocateSubnetCIDR(subnetName, usedCIDRs)
...
return ensureSubnet(ctx, rg, SharedVNetName, subnetName, cidr) // wraps POST in retryOn409

CIDR allocation is outside the retry. If two processes hash-collide (or merely list usedCIDRs before the other commits), both pick the same /20 and POST. The loser gets 409, retryOn409 re-POSTs the same CIDR → 409 forever until the 10-attempt budget runs out. With backoff that's a ~50s hang for nothing.

Move the allocate+list inside the retry function, e.g.:

return retryOn409(ctx, "allocating cluster subnet", func() error {
    used, err := listUsedCIDRs(ctx, rg)
    if err != nil { return err }
    cidr := allocateSubnetCIDR(subnetName, used)
    if cidr == "" { return fmt.Errorf("no free /20") }
    return createSubnet(ctx, rg, vnet, subnetName, cidr)
})

Also: usedCIDRs is built with an exact string match on AddressPrefix. A subnet that's a /24 (or /26, like AzureBastionSubnet at 10.0.0.0/26) won't be in the set as /20, so overlap is not detected by allocateSubnetCIDR itself. Today this is safe because shared subnets all live in 10.0.0.0/16 and cluster subnets start at 10.1.0.0/20, but it's a footgun if anyone ever lowers the second-octet start. Worth a comment, or compute overlap properly with net.ParseCIDR.

🟡 Medium — ensureSharedBastion / ensureSharedFirewall / ensureClusterIdentity not idempotent across processes

These do Get → 404 → POST. cachedFunc only dedupes within one process. Two cold parallel go test invocations in the same region will both POST a new Bastion / Firewall / UAMI / role assignment. Bastion and Firewall creates are not idempotent on conflict and are also slow (5-10min), so the loser will hit either a 409 or a long timeout with no retry.

Either:

  • wrap them in retryOn409 with a Get on each retry (so the second attempt finds the now-existing resource and returns), or
  • gate them with a subscription-scoped lease (e.g., a resource group lock or a "create-claim" tag).

ensureClusterIdentity already swallows the 409 on the role assignment — same pattern needed on the UAMI create itself.

🟡 Medium — setupPrivateDNSForAPIServer now runs for network-isolated clusters too

In the old code this was only called for non-isolated clusters. The diff removes that guard implicitly (it's unconditional in the new prepareCluster DAG). For a private/network-isolated cluster, AKS RP already provisions its own private DNS for the API server FQDN in the MC_ RG. Adding a second A record in our shared hcp.<location>.azmk8s.io zone may either be harmless (different zone name from the AKS-managed one) or conflict with resolution.

Was this intentional? If yes, please add a one-line comment explaining why both records can coexist. If not, restore the if !isNetworkIsolated guard.

🟡 Medium — addRecordSetToPrivateDNSZone silently produces wrong record names when FQDN doesn't end in .azurecr.io

recordName := strings.TrimSuffix(*fqdn, ".azurecr.io")

TrimSuffix is a no-op when the suffix is absent → recordName == fqdn, and we then write an A record with a fully-qualified name as a relative label, which Azure will either reject (4xx → bubbles up) or — worse — accept as <fqdn>.privatelink.azurecr.io. Suggest:

recordName, ok := strings.CutSuffix(*fqdn, ".azurecr.io")
if !ok {
    return fmt.Errorf("PE NIC FQDN %q does not match expected .azurecr.io suffix", *fqdn)
}

🟢 Low — migration impact of cluster name version bump

cache.go bumps cluster names (v3→v4, v4→v5, v1→v2). Existing CI/dev environments still have old MC_ RGs and per-cluster bastions/PIPs/firewalls that won't be deleted by cleanupOrphanedSubnets (it only scans the shared VNet subnets). Worth either:

  • documenting a one-time az group delete cleanup step in e2e/README.md, or
  • adding a cleanupOrphanedClusterRGs pass that deletes MC_abe2e-*-{location} RGs whose cluster no longer exists.

Otherwise leftover bastions ($$$) and PIPs will linger in every region indefinitely.

🟢 Low — setupPrivateDNSForAPIServer DAG dep is implicit via MustGet

dag.Run(g, func(ctx context.Context) error {
    return setupPrivateDNSForAPIServer(ctx, vNet.MustGet().resourceGroup, cluster)
})

This works (MustGet blocks on <-r.done), but it's a hidden dependency on the vNet task. If vNet fails, MustGet panics → the recover-as-error in dag catches it, but we get a misleading panic-stack error rather than a clean cascaded cancellation. Prefer dag.Run1(g, vNet, func(ctx, v) error { return setupPrivateDNSForAPIServer(ctx, v.resourceGroup, cluster) }) to match the rest of the function's style.


The architectural direction (single shared VNet, one Bastion, shared identity, hash-allocated cluster subnets, single DNS zone with per-cluster A records) is good. Main blockers IMO are the two High items above; the rest are improvements.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants